Correlation dimension of complex networks 
We propose a new measure to characterize the dimension of complex networks based on the ergodic
theory of dynamical systems. This measure is derived from the correlation sum of a trajectory
generated by a random walker navigating the network, and extends the classical Grassberger-Procaccia
algorithm to the context of complex networks. The method yields reliable results for both
synthetic networks and real-world networks, such as the world air-transportation network and urban
networks, and provides a computationally fast way of estimating the dimensionality of networks
that relies only on the local information provided by the walkers.
Network science has influenced the recent progress
in many areas of statistical and nonlinear physics.
The discovery of the real architecture of interactions of
many systems studied under the former disciplines [2-4]
changed the usual mean-field way to tackle problems
arising in sociology, biology, epidemiology and technology
among others [5]. Furthermore, the blossoming of the
network theoretical machinery [6] has provided a forefront
framework to interpret the relations encoded in
large datasets of diverse nature and fostered the application
of new techniques, such as community detection
algorithms [7], to coarse-grain the complex and hierarchical
landscape of interactions of real-world systems.

Recently, geometrical concepts have been exploited to 
describe and classify the structure of complex networks 
beyond purely topological aspects [8-10]. In particular,
the box-counting technique, widely used for estimating
the capacity dimension $D_0$ of an object, has been recently
extended, as a box-covering algorithm, to characterize
the dimensionality of complex networks [10-13]. This
technique proceeds by calculating the number $N(L)$ of boxes
of Euclidean volume $L^d$ required to cover an object, the
capacity dimension $D_0$ of such an object being given by
$D_0 = -\lim_{L \to 0} \log N(L) / \log L$. The capacity
dimension $D_0$ is thus seen
as an upper bound to the Hausdorff dimension.
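
As a rough illustration of the box-counting idea for a point set embedded in Euclidean space (not the network box-covering algorithm of the cited works), the following Python sketch counts occupied grid boxes and fits the slope of log N(L) against log(1/L). The function names `box_count` and `capacity_dimension` are hypothetical, and the use of grid-aligned boxes is a simplifying assumption.

```python
import math

def box_count(points, box_size):
    """Number of grid-aligned boxes of side `box_size` holding at least one point."""
    occupied = {tuple(math.floor(c / box_size) for c in p) for p in points}
    return len(occupied)

def capacity_dimension(points, box_sizes):
    """Least-squares slope of log N(L) versus log(1/L) over the given box sizes."""
    xs = [math.log(1.0 / size) for size in box_sizes]
    ys = [math.log(box_count(points, size)) for size in box_sizes]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Toy example (assumed data): a filled 64 x 64 square of integer points.
square = [(i, j) for i in range(64) for j in range(64)]
d0 = capacity_dimension(square, [1, 2, 4, 8, 16])
```

For this filled square the box counts scale as $L^{-2}$, so the fitted slope is expected to be close to 2.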

The box-covering approach, while being the most natu- 
ral and elegant extension of the concept of fractal dimen- 
sion to networks, suffers from some difficulties. First, 
in order to tile the network and to unambiguously relate
the box-covering and capacity dimensions, the object un- 
der study must be embedded in a metric space, some- 
thing that does not apply in the more general case of a 
complex network. This subtle problem can be overcome 
by restricting to spatially embedded complex networks 
[13]. A second important issue is the need for full knowledge
of the network topology in order to perform the
box-covering procedure. This constraint runs into the limitations
related to storing the complete network backbone;
indeed, the computation of the capacity dimension becomes
impractical for embedding dimensions larger than
3 0. Finally, another related problem is that of finding 
the optimum covering, whose computational complexity 
is NP-hard [H. 

The above difficulties for calculating the capacity di- 
mension of a self-similar object are however circumvented 
in the dynamical systems literature by, instead, calculating
its correlation dimension [14]. Here we take advantage
of this alternative characterization to compute the
dimension of complex networks, relying on an extension 
of the Grassberger-Procaccia algorithm [13, EH . The key 
idea to extend this concept to the network realm is to 
generate random walkers surfing the network whose di- 
mension we want to estimate and to study their actual 
trajectories as time series. As a byproduct, the extension 
of this technique opens the door to the use of the the- 
oretical machinery inherited from the ergodic theory of 
dynamical systems in the characterization of the struc- 
ture of networks. 

Indeed, the study of the structure of networks relying 
on the theory of stochastic processes, such as random 
walks, has been successfully applied in the past for 
designing ranking algorithms, such as Google [16], and
unveiling the community structure [13] or the nature 
of degree-degree correlations [Til ] in complex networks. 
In our case, although random walkers are stochastic 
processes which have an underlying infinite-dimensional 
attractor, their trajectories are expected to evidence 
temporal correlations intimately related to the structure 
of the underlying network that confines their movement. 
Thus, in the case of self-similar correlations an associated 
dimension can be properly defined, yielding a reliable 
[19] estimation of the underlying network's dimension.
In the rest of the paper, after introducing the method, 
we present some results for both synthetic and real 
spatial networks [20] and compare them with the results
obtained by means of box-covering techniques.
FIG. 1: (Left) Log-log plot of the correlation sum $C_m(r)$ as a function of similarity $r$, for a series of $4 \cdot 10^4$ data extracted from
an unbiased random walker in a 2d lattice of 1000 nodes (with correlation dimension 2) where $\mathbf{v} = (v_x, v_y)$ and $v_x, v_y \in [1, 1000]$,
for different embedding dimensions $m$. There exists a scaling regime where the slope of the correlation sum approaches 2 for
increasing values of $m$. (Right panel, bottom) Log-log plot of the correlation sum $C_m(r)$ as a function of similarity $r$, for a
series of $10^4$ data generated by a random walker over a fully connected network, for different values of the embedding dimension
$m$. In this network each node is labeled with a real value $v \in [0, 1]$. In all cases a scaling $C_m(r) \sim r^{\beta_m}$ is found. (Right
panel, top) Correlation exponent $\beta_m$ as a function of the embedding dimension $m$. The correlation exponent increases linearly
with the embedding dimension $m$, which suggests an infinite-dimensional network.

We start by introducing the method for estimating the
correlation dimension in complex networks. Let $Q$ be
an undirected network with $N$ nodes and $L$ links so that
each node $i$ of $Q$ is labeled with a generic vector $\mathbf{v}_i$, where
$\mathbf{v} \in \mathbb{R}^d$, or $\mathbf{v} \in \mathbb{N}^d$ when the space is discrete. Consider
a trajectory of length $n$ generated by an ergodic random
walker surfing the network $Q$, described by the series
$\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$. Note that in the case of spatially
embedded networks, $\mathbf{v}_i$ uniquely characterizes the position
of node $i$ in the underlying space. For instance, in a
2-dimensional space, $\mathbf{v}_i = (v_x, v_y)^T$ and the series reads
$\{v_x(1), v_y(1), v_x(2), v_y(2), \ldots, v_x(n), v_y(n)\}$. This series
is the object of study in order to describe the geometry
and dimension of $Q$ [21], and the first step is to apply
embedding techniques to $\{\mathbf{v}_t\}_{t=1}^{n}$. Inspired by Takens'
embedding theorem, we proceed to construct the surrogate
vector-valued series $\{\mathbf{V}(i)\}$, where $\mathbf{V}(i) \in \mathbb{R}^{m \cdot d}$:

$$\mathbf{V}(i) = [\mathbf{v}_i, \mathbf{v}_{i+1}, \ldots, \mathbf{v}_{i+m-1}], \qquad (1)$$



where $m$ is the so-called embedding dimension. Then, the
correlation sum function $C_m(r)$ is defined as the fraction
of pairs of vectors whose distance is smaller than some
similarity scalar $r \in \mathbb{R}$ [22]:

$$C_m(r) = \frac{2}{(n-m)(n-m+1)} \sum_{i<j} \Theta\big(r - \|\mathbf{V}(i) - \mathbf{V}(j)\|\big), \qquad (2)$$

where $\Theta(x)$ is the Heaviside step function, and $\|\cdot\|$ is
usually a p-norm, $\|\mathbf{x}\|_p = \left(\sum_i |x_i|^p\right)^{1/p}$. Without loss
of generality, here we choose $\|\cdot\|$ as the $L^\infty$ norm,
$\|\mathbf{x}\|_\infty = \max(|x_1|, |x_2|, \ldots, |x_n|)$, which induces the so-called
Chebyshev distance. Note that the use of the Euclidean norm
was originally proposed in [15], while the
maximum norm was used by Takens in [25].
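
The definitions above can be sketched in a few lines of Python. This is a minimal brute-force illustration, assuming a unit-delay embedding in which $m$ consecutive node labels are concatenated into one vector, and using the Chebyshev distance; the names `delay_vectors`, `chebyshev` and `correlation_sum` are hypothetical, and the quadratic pair loop is only practical for short series.

```python
def delay_vectors(series, m):
    """Concatenate m consecutive d-dimensional labels into one (m*d)-dimensional vector.

    `series` is a list of equal-length tuples; the result has len(series) - m + 1 vectors.
    """
    n = len(series)
    return [tuple(x for v in series[i:i + m] for x in v) for i in range(n - m + 1)]

def chebyshev(a, b):
    """Maximum-norm (L-infinity) distance between two vectors."""
    return max(abs(x - y) for x, y in zip(a, b))

def correlation_sum(series, m, r):
    """Fraction of pairs of delay vectors whose Chebyshev distance is below r."""
    vecs = delay_vectors(series, m)
    k = len(vecs)  # k = n - m + 1
    close = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if chebyshev(vecs[i], vecs[j]) < r
    )
    return 2.0 * close / (k * (k - 1))

# Toy example (assumed data): a scalar series with d = 1.
toy = [(0,), (1,), (2,), (3,)]
c = correlation_sum(toy, m=2, r=1.5)
```

With `k = n - m + 1` vectors, the prefactor `2 / (k * (k - 1))` matches the normalization $(n-m)(n-m+1)$ in the text.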

Based on arguments from ergodic theory [14], [15], we
conjecture that when the series is extracted from the trajectory
of a walker surfing a network with well defined dimension,
for sufficiently long series and sufficiently small
values of $r$, $C_m(r)$ evidences a scaling regime such that:

$$\lim_{r \to 0} \lim_{n \to \infty} \frac{\log C_m(r)}{\log r} = \beta_m, \qquad (3)$$

where $\beta_m \to \beta$ for sufficiently large embedding dimension.
Thus, $\beta$ is the estimate of the correlation dimension
of the underlying space, here the complex network
under study. Note that, in practice, the scaling regime
is expected to appear only at an intermediate range of $r$,
bounded from below by a cut-off due to poor
statistics (noise regime) and from above by a cut-off
due to nonlinear effects (macroscopic regime) [H], [26].
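
In practice the exponent is read off as the slope of the log-log plot inside the intermediate scaling range. The sketch below is one possible way to do this with an ordinary least-squares fit; the function name `correlation_exponent` and the cut-off arguments `r_low` and `r_high` are hypothetical, and choosing the cut-offs by hand is an assumption, not a procedure stated in the text.

```python
import math

def correlation_exponent(rs, cs, r_low, r_high):
    """Least-squares slope of log C(r) versus log r, restricted to r_low <= r <= r_high.

    `rs` and `cs` are equal-length sequences of similarity values and correlation sums.
    Points with C(r) = 0 are skipped because their logarithm is undefined.
    """
    pts = [
        (math.log(r), math.log(c))
        for r, c in zip(rs, cs)
        if r_low <= r <= r_high and c > 0
    ]
    if len(pts) < 2:
        raise ValueError("need at least two points inside the scaling range")
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    num = sum((x - mx) * (y - my) for x, y in pts)
    den = sum((x - mx) ** 2 for x, _ in pts)
    return num / den

# Toy example (assumed data): an exact power law C(r) = 3 r^2.
rs = [0.01 * 2 ** k for k in range(8)]
cs = [3.0 * r ** 2 for r in rs]
beta = correlation_exponent(rs, cs, r_low=0.0, r_high=10.0)
```

For the synthetic power law above the fitted slope is expected to be 2 up to rounding error; on real data the result depends on where the cut-offs are placed.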

In order to validate the above method, we first ad- 
dress a synthetic network that can be understood as a 
discrete limit of a smooth metric space with a well defined
Hausdorff dimension. In the left panel of Fig. 1
we plot the correlation sum $C_m(r)$ of a random
walker on a 2-dimensional lattice, as this is a discretized
version of the Euclidean space $\mathbb{R}^2$, for different
embedding dimensions $m$. In this lattice, each node is
labeled by a 2-dimensional vector $(v_x, v_y)$ ($d = 2$) where
FIG. 2: Log-log plot of the correlation sum $C_m(r)$ as a function
of similarity $r$, for a series of $2 \cdot 10^4$ data extracted from
a random walker of $2 \cdot 10^4$ steps over the worldwide air transportation
network (see the text), for increasing embedding
dimensions $m$. The correlation exponent converges to $\beta = 3$.



$v_x, v_y \in [1, 1000]$ are natural numbers. We find that
$C_m(r)$ evidences a scaling region with $\beta_m \to 2$, which
suggests that the underlying network has a well defined
dimension equal to 2, i.e., the Hausdorff dimension of the
plane.
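
A trajectory of this kind can be generated with a short Python sketch. The boundary treatment is an assumption, since the text does not specify it: here the lattice is bounded and the walker moves with equal probability to any existing nearest neighbour. The function name `lattice_walk` and its parameters are hypothetical.

```python
import random

def lattice_walk(side, steps, seed=0):
    """Unbiased random walk on a bounded side x side square lattice.

    Nodes are labeled by integer pairs (x, y) with 1 <= x, y <= side.
    Returns the list of visited labels, one per time step.
    """
    rng = random.Random(seed)
    x, y = rng.randint(1, side), rng.randint(1, side)
    path = [(x, y)]
    for _ in range(steps - 1):
        # Keep only the nearest neighbours that lie inside the lattice.
        moves = [
            (x + dx, y + dy)
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 1 <= x + dx <= side and 1 <= y + dy <= side
        ]
        x, y = rng.choice(moves)
        path.append((x, y))
    return path

# Toy example (assumed sizes, much smaller than those used in the text).
path = lattice_walk(side=10, steps=200, seed=1)
```

Each entry of `path` is a 2-dimensional label, so a walk of `steps` positions corresponds to a scalar series of `2 * steps` data.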

As a further validation we address the case of a fully 
connected network, in which all the nodes are connected 
among each other, which is usually seen as the discrete 
version of an infinite-dimensional space. Note that a fully 
connected network does not have a natural spatial em- 
bedding and therefore, for the sake of simplicity, we label 
each node by a single real number $v \in [0, 1]$ ($d = 1$). In
the right panel of Fig. 1 we represent the correlation
sum of the generated trajectory for different embedding
dimensions $m$ (bottom panel). In all the cases we
find a clear scaling showing different slopes $\beta_m$. In the
top panel of the same figure we plot the estimated value
$\beta_m$ as a function of $m$, pointing out a linear dependence
$\beta_m \approx m$. This lack of convergence suggests that the
underlying structure is infinite-dimensional, as expected.
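
A heuristic argument makes the linear growth plausible. It rests on two assumptions that are not stated in the text: the node labels are spread uniformly over $[0, 1]$, and the network is large enough that successive labels visited by the walker can be treated as independent uniform draws. Under the maximum norm, two delay vectors are closer than $r$ only if all $m$ of their components are:

```latex
% Two independent labels u, u' uniform on [0,1], with 0 <= r <= 1:
P\big(|u - u'| < r\big) = 1 - (1 - r)^2 = 2r - r^2 .

% For pairs of delay vectors with |i - j| >= m (the vast majority when n >> m),
% the m component-wise differences are independent, so
C_m(r) \approx \big(2r - r^2\big)^m .

% Taking the logarithmic slope for small r:
\frac{\log C_m(r)}{\log r}
  \approx m \left[ 1 + \frac{\log(2 - r)}{\log r} \right]
  \xrightarrow[r \to 0]{} m .
```

Under these assumptions the exponent $\beta_m$ tracks $m$ and never saturates, which is the behavior expected for uncorrelated data.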

Once we have validated the method in synthetic net- 
works we tackle the characterization of real-world net- 
works. We first address the case of the global air-transportation
network [27], as this is a paradigmatic
spatially embedded network whose dimension has been
recently claimed to be larger than two [13]. This network
is formed by $N = 3618$ nodes (the airports) and
$L = 13514$ links denoting the commercial routes among
them. As in the case of the 2-dimensional lattice we
label each node $i$ by a vector $\mathbf{v}_i = (x_i, y_i)$ ($d = 2$)
that determines the coordinates of these airports, where
$x_i, y_i \in [0, 1]$. In Fig. 2 we show the results of $C_m(r)$ for
a random walk trajectory of $2 \cdot 10^4$ steps, i.e., an original
FIG. 3: (A, B) Samples of San Joaquin (A) and Oldenburg
(B) urban networks (see the text). The former is a recently
founded city whose structure shows a top-down organization
and a grid-like aspect, whereas the latter is a city that
dates back to the twelfth century and shows a self-organized
shape without any evident symmetry. (C, D) Log-log plots of
the correlation sum $C_m(r)$ as a function of similarity $r$, for a
series of $4 \cdot 10^4$ data extracted from a random walker of $2 \cdot 10^4$
steps in the San Joaquin and the Oldenburg urban networks
respectively, for increasing embedding dimension $m$. Results
suggest that only San Joaquin has a well defined dimension.



series of $4 \cdot 10^4$ data. We find an intermediate regime
where a scaling $C_m(r) \sim r^{\beta_m}$ shows up, and $\beta_m \to \beta \approx 3$
for increasing values of the embedding dimension $m$. This
value coincides with the box-covering dimension of the air
transportation network, as suggested recently [13], pointing
out that, albeit embedded in 2-dimensional space,
this network has a larger effective dimensionality. Furthermore,
note that the random walk has a length of
$2 \cdot 10^4$ steps, thus revealing that it is possible to derive
an accurate value of the network dimension with only a
rather small amount of local information.
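
The locality of the procedure can be illustrated with a small Python sketch of a walker on a spatially embedded network: at every step it only needs the neighbor list and the coordinates of the node it currently occupies. The data layout (dictionaries `neighbors` and `coords`) and the function name `network_walk` are hypothetical, and the three-node graph is made-up toy data, not the air-transportation network.

```python
import random

def network_walk(neighbors, coords, steps, start, seed=0):
    """Unbiased random walk on an undirected network.

    `neighbors[node]` lists the nodes adjacent to `node`;
    `coords[node]` is the spatial label of `node`.
    Returns the series of labels visited by the walker.
    """
    rng = random.Random(seed)
    node = start
    series = [coords[node]]
    for _ in range(steps - 1):
        # Only local information is used: the neighbors of the current node.
        node = rng.choice(neighbors[node])
        series.append(coords[node])
    return series

# Toy example (assumed data): a triangle with normalized coordinates.
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
coords = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (0.0, 1.0)}
series = network_walk(neighbors, coords, steps=50, start=0, seed=3)
```

The resulting `series` has the same form as the lattice trajectories, so it can be fed to any routine that computes a correlation sum from a list of labels.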

To round off, we explore the dimension of urban networks
[28], and address two paradigmatic cases of urban
development: the case of San Joaquin county (Califor- 
nia, US), having N = 18623 nodes and L = 23874 edges, 
and that of Oldenburg (Germany), with N = 6105 nodes 
and L = 7035 edges (see panels A and B in Fig. 3
for graphical illustrations). In both networks, each node
is characterized by a 2-dimensional vector $(x, y)$ where
$x, y \in [0, 10000]$ ($d = 2$). Notice that San Joaquin is a
recently founded city (1920) whose shape is the result
of a planning process and, accordingly, displays a grid-like
road structure. Conversely, Oldenburg (Germany) is
an old city whose foundation dates back to the twelfth
century and whose road pattern is the result of a self-organized
growth. In panels C and D of the same figure,
we show their respective correlation sum functions.
While the case of San Joaquin (panel C) evidences a 
scaling regime with a correlation dimension converging 
to 2, no scaling is found for the self-organized city of 
Oldenburg (panel D), suggesting that this latter network 
does not possess a well defined dimension. These different
behaviors add to the recently observed structural
differences between cities that have grown according to 
different evolutionary mechanisms [lo, EH . 

To conclude, in this work we propose an extension of 
the Grassberger-Procaccia method to estimate the correlation
dimension of a complex network from the analysis
of the trajectories of random walkers on top of
it. Although the original method was initially designed
as a tool to retrieve the attractor dimension of
low-dimensional chaotic dynamics, the presence of temporal
correlations in stochastic dynamics (here induced
by the geometry of the network) also produces similar behaviors
under this celebrated framework [26], [2^]. Thus,
in this work we deliberately exploit this property when
using random walks as the trajectories under study. This
points to the possibility of making use of concepts and
tools from the ergodic theory of dynamical systems in
the realm of complex networks.

Our results suggest that the dimensionality of spatially 
embedded networks can be retrieved from this analysis. 
We highlight that the method only requires local infor- 
mation and it works with rather small time series. This 
constitutes an advantage for saving memory resources on 
one hand, and perhaps more importantly, it provides a 
way to make estimates about the dimension of a network 
without having global information about its structure. An
example of such a situation is the routing of information
in the Internet, as it is easy to access the sequence
of IP addresses a packet navigates through, while accessing
the whole Internet map seems unfeasible.

Further work should be done in order (i) to check in 
which situations this procedure can be performed, (ii) to 
relate the meaning of the exponent found in this work 
with other exponents recently defined in the network literature,
and (iii) to extend this method to the study of
generic networks beyond spatially embedded ones. 